Model-Free Trajectory Optimization for Reinforcement Learning
Authors
Abstract
Many recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local, quadratic, time-dependent Q-function, allowing the policy update to be derived in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, and demonstrates improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics.
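The closed-form KL-constrained update admits a compact illustration. Below is a minimal sketch of such an update for a Gaussian policy under a local quadratic Q-model, in the spirit of REPS/MORE-style updates rather than a faithful reproduction of the paper's algorithm: it assumes a single time step, a state-independent Gaussian policy N(m0, S0), and a concave quadratic model Q(a) = -1/2 a'Aa + b'a. All names (kl_constrained_update, epsilon, eta) are illustrative, not from the paper.

import numpy as np
from scipy.optimize import minimize_scalar

def kl_constrained_update(m0, S0, A, b, epsilon=0.1):
    """Closed-form KL-constrained Gaussian policy update (illustrative sketch).

    Solves  max_pi E_pi[Q(a)]  s.t.  KL(pi || pi_old) <= epsilon,
    with pi_old = N(m0, S0) and Q(a) = -0.5 a'A a + b'a (A positive definite).
    The optimum is pi(a) proportional to pi_old(a) * exp(Q(a) / eta), which
    stays Gaussian; the temperature eta is found by minimizing the 1-D dual.
    """
    S0_inv = np.linalg.inv(S0)
    _, logdet_S0 = np.linalg.slogdet(S0)

    def dual(eta):
        P = S0_inv + A / eta                      # new precision matrix
        sign, logdet_P = np.linalg.slogdet(P)
        if sign <= 0:                             # eta too small: infeasible
            return np.inf
        r = S0_inv @ m0 + b / eta                 # new linear term
        m = np.linalg.solve(P, r)                 # new mean
        # Log partition function Z(eta) = integral of pi_old(a) * exp(Q(a)/eta)
        log_Z = 0.5 * (r @ m - m0 @ S0_inv @ m0 - logdet_S0 - logdet_P)
        return eta * epsilon + eta * log_Z        # REPS-style dual g(eta)

    res = minimize_scalar(dual, bounds=(1e-6, 1e6), method="bounded")
    eta = res.x
    P = S0_inv + A / eta
    m = np.linalg.solve(P, S0_inv @ m0 + b / eta)
    return m, np.linalg.inv(P)                    # new mean and covariance

# Toy usage: 2-D actions, concave quadratic Q-model.
m_new, S_new = kl_constrained_update(
    m0=np.zeros(2), S0=np.eye(2),
    A=np.diag([2.0, 1.0]), b=np.array([1.0, -0.5]), epsilon=0.05)

Because the dual is one-dimensional in eta, a bounded scalar minimization suffices. The time-dependent, state-conditioned update described in the abstract presumably follows the same closed-form Gaussian pattern, applied per time step of the trajectory.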
Similar Resources
Model Based Reinforcement Learning with Final Time Horizon Optimization
We present one of the first algorithms for model-based reinforcement learning and trajectory optimization with a free final time horizon. Grounded in optimal control theory and Dynamic Programming, we derive a set of backward differential equations that propagate the value function and provide the optimal control policy and the optimal time horizon. The resulting policy generalizes previous re...
Scalable Reinforcement Learning via Trajectory Optimization and Approximate Gaussian Process Regression
Over the last decade, reinforcement learning (RL) has begun to be successfully applied to robotics and autonomous systems. While model-free RL has demonstrated promising results [1, 2, 3], it requires human expert demonstrations and relies on many direct interactions with the physical system. In contrast, model-based RL was developed to address the issue of sample inefficiency by learning d...
Robust Trajectory Optimization: A Cooperative Stochastic Game Theoretic Approach
We present a novel trajectory optimization framework to address the issues of robustness, scalability, and efficiency in optimal control and reinforcement learning. Based on prior work in Cooperative Stochastic Differential Game (CSDG) theory, our method performs local trajectory optimization using cooperative controllers. The resulting framework is called Cooperative Game-Differential Dynamic Pr...
Universal Planning Networks
A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradie...
Using trajectory data to improve Bayesian optimization for reinforcement learning
Recently, Bayesian Optimization (BO) has been used to successfully optimize parametric policies in several challenging Reinforcement Learning (RL) applications. BO is attractive for this problem because it incorporates Bayesian prior information about the expected return and exploits this knowledge to select new policies to execute. Effectively, the BO framework for policy search addresses the expl...
Publication date: 2016